SSDB: Sequence Similarity Database in KEGG

Authors

Yoko Sato

Akihiro Nakaya

Kotaro Shiraishi

Shuichi Kawashima

Susumu Goto

Minoru Kanehisa

Abstract

The availability of a large number of complete genomes makes it possible to compare several genomes and to search for common and differing features between them in terms of protein sequence similarity, an approach we call comparative genomics. It yields information about proteins that is useful for assigning functions to genes and for studying genome evolution. The large number of genes accumulated in the databases of complete genomes, however, has become a bottleneck, because computing the sequence similarity of all pairs of proteins is time consuming even on a supercomputer. Precomputed sequence similarities for completely sequenced organisms are therefore indispensable for comparative genomics. SSDB (Sequence Similarity Database) is a new addition to the KEGG suite of databases [3] and contains information about amino acid sequence similarities among all protein-coding genes in the complete genomes, together with information about best hits and bidirectional best hits (best-best hits). Gene x in genome A and gene y in genome B are called bidirectional best hits when x is the best hit of query y against all genes in A and y is the best hit of query x against all genes in B; this relation is often used as an operational definition of orthologs. We report here the system design and simple search capabilities of SSDB.
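The bidirectional best hit relation defined in the abstract can be illustrated with a minimal Python sketch. It is not the SSDB implementation: the function names, the dictionary layout of the similarity scores, and the toy gene identifiers and scores are all assumptions made for illustration. It only shows the definition at work, taking precomputed pairwise scores in both directions and keeping the pairs (x, y) where each gene is the other's best hit.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: scores[(query_gene, target_gene)] = similarity score.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores, queries: List[str], targets: List[str]) -> Dict[str, str]:
    """For each query gene, return the target gene with the highest score.

    Ties are broken by target name (an arbitrary choice made for this sketch).
    """
    best = {}
    for q in queries:
        candidates = [(scores[(q, t)], t) for t in targets if (q, t) in scores]
        if candidates:
            best[q] = max(candidates)[1]
    return best


def bidirectional_best_hits(a_to_b: Scores, b_to_a: Scores,
                            genes_a: List[str], genes_b: List[str]) -> List[Tuple[str, str]]:
    """Return pairs (x, y) where y is x's best hit in B and x is y's best hit in A."""
    best_ab = best_hits(a_to_b, genes_a, genes_b)
    best_ba = best_hits(b_to_a, genes_b, genes_a)
    return sorted((x, y) for x, y in best_ab.items() if best_ba.get(y) == x)


# Toy data (invented): two genes per genome, scores in both search directions.
genes_a = ["a1", "a2"]
genes_b = ["b1", "b2"]
a_to_b = {("a1", "b1"): 900, ("a1", "b2"): 300, ("a2", "b1"): 850, ("a2", "b2"): 200}
b_to_a = {("b1", "a1"): 900, ("b1", "a2"): 850, ("b2", "a1"): 300, ("b2", "a2"): 200}

pairs = bidirectional_best_hits(a_to_b, b_to_a, genes_a, genes_b)
print(pairs)
```

In this toy data, a2's best hit is b1, but b1's best hit is a1, so only the pair (a1, b1) qualifies. This is why a best hit in one direction alone is a weaker criterion than the bidirectional relation.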

Similar resources

Classification of Protein Sequences into Paralog and Ortholog Clusters Using Sequence Similarity Profiles of KEGG/SSDB

We are constructing KEGG/OC (Ortholog Clusters) from KEGG/SSDB (Sequence Similarity DataBase) [2]. KEGG/SSDB contains exhaustive protein sequence similarity scores of completed and nearly completed genomes calculated by the SSEARCH program [3]. KEGG/OC is constructed automatically from the graph analysis of searching cliques with an appropriate definition for the profiles of similarity scores. ...

Automatic generation of KEGG OC (Ortholog Cluster) and its assignment to draft genomes

As the number of sequenced genomes is growing rapidly, a method for the automatic generation of orthologous gene clusters is needed. However, it is computationally hard to cluster a large number of genes at once. To address this problem, we have developed a heuristic method that assigns gene groups from closely related organisms to an ortholog cluster in a bottom-up approach. In this method, we consi...

The KEGG databases at GenomeNet

The Kyoto Encyclopedia of Genes and Genomes (KEGG) is the primary database resource of the Japanese GenomeNet service (http://www.genome.ad.jp/) for understanding higher order functional meanings and utilities of the cell or the organism from its genome information. KEGG consists of the PATHWAY database for the computerized knowledge on molecular interaction networks such as pathways and comple...

Identification of Ortholog Groups in KEGG/SSDB by Considering Domain Structures

With the advent of recent genome projects, a huge amount of genome information is stored in databases. Although we can effectively predict protein sequences from these genomes, the functions of most proteins have not been determined experimentally. Computational methods based on the comparison and clustering of protein sequences are therefore most important for function prediction. However, complications a...

Additional File 2 – the Biological Support of the Gene Regulations for Yeast Cell Cycling. Knowledge Databases Kegg Database Sgd and Cygd Databases Results and Discussions

KEGG database The KEGG [1, 2] is a suite of databases and associated software that integrates current knowledge on molecular interaction networks in biological processes (PATHWAY database), the information about the universe of genes and proteins (GENES/SSDB/KO databases), and the information about the universe of chemical compounds, drugs and their biochemical reactions (COMPOUND/DRUG/GLYCAN/R...

Publication date: 2001
